Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?

Authors

  • Mikel L. Forcada
  • Felipe Sánchez-Martínez
  • Miquel Esplà-Gomis
  • Lucia Specia
Abstract

We propose a simple, linear-combination automatic evaluation measure (AEM) to approximate post-editing (PE) effort. Effort is measured both as PE time and as the number of PE operations performed. The ultimate goal is to define an AEM that can be used to optimize machine translation (MT) systems to minimize PE effort, but without having to perform unfeasible repeated PE during optimization. As PE effort is expected to be an extensive magnitude (i.e., one growing linearly with the sentence length and which may be simply added to represent the effort for a set of sentences), we use a linear combination of extensive and pseudo-extensive features. One such pseudo-extensive feature, 1–BLEU times the length of the reference, proves to be almost as good a predictor of PE effort as the best combination of extensive features. Surprisingly, effort predictors computed using independently obtained reference translations perform reasonably close to those using actual post-edited references. In the early stage of this research and given the inherent complexity of carrying out experiments with professional post-editors, we decided to carry out an automatic evaluation of the AEMs proposed rather than a manual evaluation to measure the effort needed to post-edit the output of an MT system tuned on these AEMs. The results obtained seem to support current tuning practice using BLEU, yet pointing at some limitations. Apart from this intrinsic evaluation, an extrinsic evaluation was also carried out in which the AEMs proposed were used to build synthetic training corpora for MT quality estimation, with results comparable to those obtained when training with measured PE efforts.

© 2017 PBML. Distributed under CC BY-NC-ND. Corresponding author: [email protected]

Cite as: Mikel L. Forcada, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Lucia Specia. Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful? The Prague Bulletin of Mathematical Linguistics No. 108, 2017, pp. 183–195. doi: 10.1515/pralin-2017-0019.
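
As an illustration of the pseudo-extensive feature described in the abstract, the following minimal Python sketch computes (1 − BLEU) times the reference length per sentence and fits a single weight to measured effort. It is only a sketch under stated assumptions, not the authors' implementation: the function names, the no-intercept least-squares fit, and all numbers are hypothetical, and BLEU is assumed to be a precomputed sentence-level score in [0, 1].

```python
def pseudo_extensive_feature(bleu, ref_length):
    """Return (1 - BLEU) * reference length for one sentence.

    Assumption: `bleu` is a precomputed sentence-level score in [0, 1].
    Scaling by length makes the feature grow with sentence length.
    """
    if not 0.0 <= bleu <= 1.0:
        raise ValueError("bleu must be in [0, 1]")
    return (1.0 - bleu) * ref_length


def fit_weight(features, efforts):
    """Least-squares weight w for effort ~ w * feature, with no intercept.

    Assumption: a single-feature model; the paper combines several features.
    """
    numerator = sum(x * y for x, y in zip(features, efforts))
    denominator = sum(x * x for x in features)
    if denominator == 0:
        raise ValueError("all features are zero; weight is undefined")
    return numerator / denominator


def predict_total_effort(weight, features):
    """Predicted effort for a set of sentences: per-sentence predictions add up."""
    return weight * sum(features)


if __name__ == "__main__":
    # Hypothetical (BLEU, reference length) pairs and made-up effort values.
    sentences = [(0.5, 10), (0.75, 20), (0.0, 8)]
    measured_effort = [10.0, 10.0, 16.0]

    feats = [pseudo_extensive_feature(b, n) for b, n in sentences]
    w = fit_weight(feats, measured_effort)
    print("weight:", w)
    print("predicted total effort:", predict_total_effort(w, feats))
```

Omitting the intercept is a design choice made here so that predictions stay additive across sentences, in line with the abstract's description of effort as an extensive magnitude.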


Similar resources

Perception vs Reality: Measuring Machine Translation Post-Editing Productivity

This paper presents a study of user-perceived vs real machine translation (MT) post-editing effort and productivity gains, focusing on two bidirectional language pairs: English–German and English–Dutch. Twenty experienced media professionals post-edited statistical MT output and also manually translated comparative texts within a production environment. The paper compares the actual post-editi...


Can Statistical Post-Editing with a Small Parallel Corpus Save a Weak MT Engine?

Statistical post-editing has been shown in several studies to increase BLEU score for rule-based MT systems. However, previous studies have relied solely on BLEU and have not conducted further study to determine whether those gains indicated an increase in quality or in score alone. In this work we conduct a human evaluation of statistical post-edited output from a weak rule-based MT system, co...


Optimizing Chinese Word Segmentation for Machine Translation Performance

Previous work has shown that Chinese word segmentation is useful for machine translation to English, yet the way different segmentation strategies affect MT is still poorly understood. In this paper, we demonstrate that optimizing segmentation for an existing segmentation standard does not always yield better MT performance. We find that other factors such as segmentation consistency and granul...


Multi-source Neural Automatic Post-Editing: FBK's participation in the WMT 2017 APE shared task

Previous phrase-based approaches to Automatic Post-editing (APE) have shown that the dependency of MT errors from the source sentence can be exploited by jointly learning from source and target information. By integrating this notion in a neural approach to the problem, we present the multi-source neural machine translation (NMT) system submitted by FBK to the WMT 2017 APE shared task. Our syst...



Publication date: 2017